Make rule removal depend on gap in output #22
Conversation
The task for tomorrow is then simple: fully work through an example based on the explanation by D.W. and put that example in the Implementation Overview.
Okay, so the difficulty seems to be that the simple approach of converting the rules to a binary feature space is not as easy as I thought. Linearly dependent rules are not guaranteed to show up. Maybe there is an algorithm to automatically find linear dependence while throwing away constraints, or otherwise I need to find the bug in my implementation of D.W.'s suggestion.
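The rank-based idea can be sketched as follows. This is a minimal illustration and not SIRUS's actual code: the `dependent_rows` helper and the row-per-rule layout of the binary feature matrix are assumptions. A rule's feature vector is flagged as dependent when appending it to the rows kept so far does not increase the matrix rank.

```julia
using LinearAlgebra

# Hypothetical helper: flag each row of a 0/1 feature matrix that is a
# linear combination of the rows kept before it.
function dependent_rows(M::AbstractMatrix)
    flags = falses(size(M, 1))
    kept = Matrix{Float64}(undef, 0, size(M, 2))
    for i in axes(M, 1)
        candidate = vcat(kept, reshape(Float64.(M[i, :]), 1, :))
        if rank(candidate) > size(kept, 1)
            kept = candidate   # the row adds new information
        else
            flags[i] = true    # the row is linearly dependent on kept rows
        end
    end
    return flags
end
```

For example, `dependent_rows([1 0; 0 1; 1 1])` flags only the third row, since it is the sum of the first two.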
(Force-pushed the branch from aac4e1d to 6bdbd04.)
Improved accuracy slightly after ordering the rules by gap size in 67e3ec6. (Before/after benchmark screenshots.)
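A minimal sketch of the gap-size ordering mentioned above, under stated assumptions: the `SimpleRule` type, its fields, and the `gap` definition are illustrative, not SIRUS's actual types. The gap is taken here to be the absolute difference between a rule's then- and else-predictions, and rules are sorted so the largest gaps come first.

```julia
# Illustrative rule type (hypothetical; SIRUS's Rule stores these
# values differently).
struct SimpleRule
    then_pred::Float64   # prediction when the rule condition holds
    else_pred::Float64   # prediction otherwise
end

# Gap between the two output values of a rule.
gap(r::SimpleRule) = abs(r.then_pred - r.else_pred)

# Sort rules by descending gap, so that when linearly dependent rules
# are removed afterwards, the most informative ones survive.
sort_by_gap(rules) = sort(rules; by=gap, rev=true)
```

Ordering before removal matters because the dependence filter keeps whichever rules it sees first.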
It looks like improving the rule post-processing step really does improve regression performance. Probably there is still something wrong, which would explain why the performance remains so poor. This also explains why I noticed earlier that the rule extraction method didn't affect outcomes much: it now looks like it does for regression, but not for classification.
After 65907ae:
When not simplifying the single rules, `@test length(S._process_rules(repeat(allrules, 34), algo, 9)) == 9` fails, resulting in:
Localized the following bug in eb74b60. It happens only when the number of repeats is greater than or equal to 34:

```julia
julia> r1
SIRUS.Rule(TreePath(" X[i, 1] < 32000.0 "), [0.061], [0.408])

julia> r2
SIRUS.Rule(TreePath(" X[i, 1] ≥ 32000.0 "), [0.408], [0.061])

julia> r4
SIRUS.Rule(TreePath(" X[i, 2] ≥ 8000.0 "), [0.386], [0.062])

julia> dependent = S._linearly_dependent([repeat([r2, r1], 34); r4], A, B)
69-element BitVector:
 0
 1
 1
 ⋮
 1
 0
 1
 1
```

Here, EDIT: Fixed by using
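Since the fix itself is cut off above, the following is only a hedged illustration of one way such a rank miscount can be avoided: Julia's `LinearAlgebra.rank` counts singular values above a floating-point tolerance, which can misjudge the rank of large, highly repetitive 0/1 matrices, whereas Gaussian elimination over exact rational arithmetic cannot. `exact_rank` is a hypothetical helper, not the fix applied in eb74b60.

```julia
# Hypothetical helper: count pivots via row reduction over exact
# rational arithmetic, avoiding floating-point tolerance issues.
function exact_rank(M::AbstractMatrix)
    A = Rational{BigInt}.(M)
    nrows, ncols = size(A)
    r = 0
    for col in 1:ncols
        # Find a pivot in this column at or below row r + 1.
        k = findfirst(i -> A[i, col] != 0, (r + 1):nrows)
        k === nothing && continue
        p = r + k
        A[r + 1, :], A[p, :] = A[p, :], A[r + 1, :]  # swap pivot row up
        r += 1
        # Eliminate the column below the pivot.
        for i in (r + 1):nrows
            if A[i, col] != 0
                A[i, :] .-= (A[i, col] // A[r, col]) .* A[r, :]
            end
        end
    end
    return r
end
```

With 34 repeats of two rules plus one extra rule, the exact rank of the feature matrix stays 3, however many duplicate rows are stacked.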
This PR fixes multiple problems:

- the `rank` calculation, to avoid removing the wrong rules;
- `_feature_space`: the old calculation was wrong in some cases (a test was added).

Works towards #13. Maybe this PR already finishes #13, because the `max_rules=10` scores are very close to the `StableForestRegressor` scores.

(Before/after benchmark screenshots.)